control instruction
See, Think, Act: Teaching Multimodal Agents to Effectively Interact with GUI by Identifying Toggles
Wu, Zongru, Mao, Rui, Tian, Zhiyuan, Cheng, Pengzhou, Ju, Tianjie, Wu, Zheng, Dong, Lingzhong, Sheng, Haiyue, Zhang, Zhuosheng, Liu, Gongshen
The advent of multimodal agents facilitates effective interaction within graphical user interface (GUI), especially in ubiquitous GUI control. However, their inability to reliably execute toggle control instructions remains a key bottleneck. To investigate this, we construct a state control benchmark with binary toggle instructions from public datasets. Evaluations of existing agents demonstrate their unreliability, particularly when the current toggle state already matches the desired state. To address the challenge, we propose State-aware Reasoning (StaR), a training method that teaches agents to perceive the current toggle state, analyze the desired state from the instruction, and act accordingly. Experiments on three multimodal agents demonstrate that StaR can improve toggle instruction execution accuracy by over 30\%. Further evaluations on three public benchmarks show that StaR also enhances general task performance. Finally, evaluations on a dynamic environment highlight the potential of StaR for real-world applications. Code, benchmark, and StaR-enhanced agents are available at https://github.com/ZrW00/StaR.
Prompt-Based Length Controlled Generation with Multiple Control Types
Jie, Renlong, Meng, Xiaojun, Shang, Lifeng, Jiang, Xin, Liu, Qun
Large language models (LLMs) have attracted great attention given their strong performance on a wide range of NLP tasks. In practice, users often expect generated texts to fall within a specific length range, making length controlled generation an important topic, especially for GPT-style models. Existing length control methods mostly focus on a simple control type of "equal to" a target length. Different from them, we propose a prompt-based method to achieve length controlled generation under different control types with high accuracy. In particular, we adopt reinforcement learning (RL) and sample filtering with the reward signal given by rule-based reward models, which enhances the length control ability of models by rewarding outputs that follow certain control instructions. In addition, we introduce a standard prompt extractor to parse arbitrary users' input into standard control instructions. Experiments show that our method significantly improves the accuracy of prompt-based length control on popular summarization datasets like CNNDM and NYT under multiple control types. Moreover, both the standard prompt extractor and RL-tuned model show strong generalization to unseen control prompt templates.
Prompt-Based Length Controlled Generation with Reinforcement Learning
Jie, Renlong, Meng, Xiaojun, Shang, Lifeng, Jiang, Xin, Liu, Qun
Large language models (LLMs) like ChatGPT and GPT-4 have attracted great attention given their surprising performance on a wide range of NLP tasks. Length controlled generation of LLMs emerges as an important topic, which enables users to fully leverage the capability of LLMs in more real-world scenarios like generating a proper answer or essay of a desired length. In addition, the autoregressive generation in LLMs is extremely time-consuming, while the ability of controlling this generated length can reduce the inference cost by limiting the length. Therefore, we propose a prompt-based length control method to achieve high-accuracy length controlled generation. In particular, we adopt reinforcement learning with the reward signal given by either trainable or rule-based reward models, which further enhances the length-control ability of LLMs by rewarding outputs that follows pre-defined control instruction. To enable rule-based inference, we also introduce standard prompt extractor to collect the standard control information from users' input. Experiments show that our method significantly improves the accuracy of prompt-based length control for summarization task on popular datasets like CNNDM and NYT. Both the standard prompt extractor and the RL-tuned model have show strong generalization ability to unseen control prompt templates.
- Research Report (1.00)
- Overview (0.68)
Implementation of general formal translators
The general translator formalism and computing specific implementations are proposed. The implementation of specific elements necessary to process the source and destination information within the translators are presented. Some common directives or instructions, such as classes and procedures, were unified and generalized in order to allow general translations implementations. In order to cover general cases, two levels of processing are required, related to the source and destination information appropriate transformations, with the related control and processing instructions. The proposed general translator elements are useful for processing natural or artificial information described through any types of languages or systems.
- North America > United States > Pennsylvania (0.04)
- Europe > Romania > Nord-Est Development Region > Iași County > Iași (0.04)
Reconfigurable co-processor architecture with limited numerical precision to accelerate deep convolutional neural networks
Wijeratne, Sasindu, Jayaweera, Sandaruwan, Dananjaya, Mahesh, Pasqual, Ajith
Convolutional Neural Networks (CNNs) are widely used in deep learning applications, e.g. visual systems, robotics etc. However, existing software solutions are not efficient. Therefore, many hardware accelerators have been proposed optimizing performance, power and resource utilization of the implementation. Amongst existing solutions, Field Programmable Gate Array (FPGA) based architecture provides better cost-energy-performance trade-offs as well as scalability and minimizing development time. In this paper, we present a model-independent reconfigurable co-processing architecture to accelerate CNNs. Our architecture consists of parallel Multiply and Accumulate (MAC) units with caching techniques and interconnection networks to exploit maximum data parallelism. In contrast to existing solutions, we introduce limited precision 32 bit Q-format fixed point quantization for arithmetic representations and operations. As a result, our architecture achieved significant reduction in resource utilization with competitive accuracy. Furthermore, we developed an assembly-type microinstructions to access the co-processing fabric to manage layer-wise parallelism, thereby making re-use of limited resources. Finally, we have tested our architecture up to 9x9 kernel size on Xilinx Virtex 7 FPGA, achieving a throughput of up to 226.2 GOp/S for 3x3 kernel size.
- Europe > Switzerland > Zürich > Zürich (0.14)
- North America > United States > New York > New York County > New York City (0.05)
- Oceania > Australia > New South Wales > Sydney (0.04)
- Asia > Sri Lanka (0.04)